Introduction

From 1991 to 2017, housing quality in New York improved dramatically; however, some sectors of the housing stock continue to face poor conditions, and some specific maintenance deficiencies continue to show higher prevalence. In this project, we develop an index of poor housing quality in New York that measures physical deficiencies and shows how the prevalence of these issues has shifted over time. We use data from the New York City Housing and Vacancy Survey (NYCHVS) [1] and follow a procedure similar to the Poor Quality Index (PQI) developed for the American Housing Survey [2].

This graphic shows the distribution of poor housing conditions, but it fails to capture interactions between those conditions. To measure the quality of individual units, we construct an index. Because this is count data with many categories, we chose a bar chart over a pie chart.

Background Methodology

The index is a weighted sum of 22 variables chosen by the authors. A variable was selected if the authors agreed that it described poor housing conditions. The index is not exhaustive, as the authors decided to build an index that is robust with respect to time. Variables that were collected in only a small number of years were disregarded, to avoid inflating scores in the years for which such a variable was collected. More data could potentially be collected or used to better suit our purpose.

The authors chose not to include financial data, such as rent or income, in the index. This is largely due to the complexity of implementing such a measure. In particular, poor housing conditions may be a good predictor of income, but the reverse is not necessarily true. This is better visualised in (Fig. 5) below.

The index below is an ordinal measure: the higher the score, the more indicative it is of poor housing conditions. Some items in the index have been ranked by the authors accordingly. However, due to the qualitative nature of this scoring, the authors chose to rank only a few variables and to default to a score of two in all other cases. Further analysis to choose optimal weights is recommended.

Item  Description                                           NYCHVS Variable  Score
1     Exterior Walls: Missing brick, siding, or other       d1               2
2     Exterior Walls: Sloping or bulging walls              d2               2
3     Exterior Walls: Major cracks                          d3               2
4     Exterior Walls: Loose or hanging cornice, roof, etc.  d4               2
5     Interior Walls: Cracks or holes                       36a              2
6     Interior Walls: Broken plaster or peeling paint       37a              2
7     Broken or missing windows                             e1               5
8     Rotten or loose windows                               e2               2
9     Boarded up windows                                    e3               3
10    Sagging or sloping floors                             g1               2
11    Slanted/shifted doorsills or frames                   g2               2
12    Deep wear in floor causing depressions                g3               2
13    Holes or missing flooring                             g4               2
14    Stairs: Loose, broken, or missing stair               f1               2
15    Stairs: Loose, broken, or missing steps               f2               2
16    No interior steps or stairways                        f4               2
17    No exterior steps or stairways                        f5               2
18    Number of heating equipment breakdowns                32b              2 per breakdown
19    Kitchen facilities functioning                        26c              3 if no, 5 if no kitchen facilities
20    Toilet breakdowns                                     25c              3 if any, 5 if no toilet or plumbing
21    Presence of mice or rats                              35a              3
22    Water leakage                                         38a              3
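
As an illustration of how the scoring table translates into a per-unit score, the following is a minimal Python sketch. It assumes each unit has already been recoded into a dictionary of yes/no flags keyed by the NYCHVS variable names, plus three hypothetical keys (`heating_breakdowns`, `kitchen`, `toilet`) for the items with graded scores. The actual NYCHVS response codes differ and would need to be mapped first, so this is not the authors' implementation.

```python
# Weights for the simple yes/no items, copied from the scoring table.
FLAG_WEIGHTS = {
    "d1": 2, "d2": 2, "d3": 2, "d4": 2,      # exterior walls
    "36a": 2, "37a": 2,                      # interior walls
    "e1": 5, "e2": 2, "e3": 3,               # windows
    "g1": 2, "g2": 2, "g3": 2, "g4": 2,      # floors and door frames
    "f1": 2, "f2": 2, "f4": 2, "f5": 2,      # stairs
    "35a": 3, "38a": 3,                      # rodents, water leakage
}

# Graded items; the category labels are hypothetical recodes, not NYCHVS codes.
KITCHEN_SCORES = {"working": 0, "not_working": 3, "none": 5}
TOILET_SCORES = {"working": 0, "breakdown": 3, "none": 5}


def pqi_score(unit):
    """Return the poor quality index score for one recoded unit."""
    # Sum the weights of every deficiency flagged as present.
    score = sum(w for var, w in FLAG_WEIGHTS.items() if unit.get(var, False))
    # Item 18: two points per heating equipment breakdown.
    score += 2 * unit.get("heating_breakdowns", 0)
    # Items 19 and 20: graded scores.
    score += KITCHEN_SCORES[unit.get("kitchen", "working")]
    score += TOILET_SCORES[unit.get("toilet", "working")]
    return score


example_unit = {"e1": True, "35a": True, "heating_breakdowns": 2, "kitchen": "none"}
print(pqi_score(example_unit))
```

A unit with no reported deficiencies scores 0 under this sketch, which matches the large share of zero-score units described in the next section.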

Index Visualization

Figure 2 shows the poor quality index scores for the 156,230 occupied units in the New York housing dataset from 1991 to 2017. The frequency distribution is skewed to the right. Overall, forty-five percent of the units scored 0. The highest score, 54 points, occurred in 1993. The year 2008 had the highest share (64%) of units with a poor quality score of 0. Since we are showing a distribution we could have used histograms, but we chose a line graph instead in order to plot all years simultaneously and distinguish between them. We considered animation but decided against it, as it might have made the graphic overwhelming. Instead, the graph is interactive, with a subsettable legend and a tooltip.
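
The per-year frequency distribution behind a plot like this can be sketched as follows. This is an illustrative Python example on toy data, assuming the scores are available as `(year, score)` pairs; the function name and input layout are assumptions, not the authors' code.

```python
from collections import Counter, defaultdict


def score_distribution_by_year(records):
    """Map each year to {score: percent of that year's units with that score}."""
    counts = defaultdict(Counter)
    for year, score in records:
        counts[year][score] += 1
    dist = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        dist[year] = {s: 100.0 * c / total for s, c in sorted(counter.items())}
    return dist


# Toy data only; not taken from the NYCHVS.
toy = [(1991, 0), (1991, 0), (1991, 4), (1991, 7), (2017, 0), (2017, 2)]
dist = score_distribution_by_year(toy)
print(dist[1991].get(0, 0.0))  # percent of 1991 toy units with a score of 0
```

Plotting one line per year from `dist` gives the overlaid distributions, and the entry for score 0 gives the share of units with no recorded deficiencies.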

Figure 3 shows the percent of occupied units with poor quality scores. Over the period from 1991 to 2017, most of the units had poor quality scores between 1 and 10 points; very few units had scores over 20 points. For this data we chose bar plots over tables, although either would have sufficed. The bar plots have the advantage that trends over time are more easily visualized, but the small bars in the final plot are hard to distinguish. A tooltip is available as a result.
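
Grouping scores into ranges before plotting could look like the sketch below. The band boundaries (0, 1-10, 11-20, 21+) are an assumption based on the ranges mentioned in the text; the exact bins used in the figure are not stated.

```python
from collections import Counter

# Assumed band labels, in display order.
BANDS = ("0", "1-10", "11-20", "21+")


def score_band(score):
    """Assign a poor quality score to one of the assumed bands."""
    if score == 0:
        return "0"
    if score <= 10:
        return "1-10"
    if score <= 20:
        return "11-20"
    return "21+"


def percent_by_band(scores):
    """Return the percent of units falling in each band."""
    counts = Counter(score_band(s) for s in scores)
    total = len(scores)
    return {band: 100.0 * counts[band] / total for band in BANDS}


# Toy scores only; not taken from the NYCHVS.
print(percent_by_band([0, 0, 3, 10, 11, 25, 0, 1]))
```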

Figure 4 tracks trends in poor quality index scores over the period from 1991 to 2017. We decided to report the means, medians, 75th percentiles, 95th percentiles, and 99th percentiles. In most years, the median poor quality score was 0. The mean went from 4.0 in 1991 to 2.5 in 2017. The 99th percentiles clearly show the improvement of housing in New York (from 25 poor quality points in 1991 to 18 poor quality points in 2017). Line graphs were chosen to represent multiple trends over time. Since the lines are on a similar scale, they are preferable to stacked bar plots, which would require interpreting differences between bar segments. The 6 trends shown are a standard percentile partition compared to the mean.
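
The summary statistics reported here can be computed per year with a few lines of Python. The sketch below uses the nearest-rank definition of a percentile, which is an assumption: the authors' software may interpolate differently, so values near the tails could differ slightly.

```python
from statistics import mean, median


def percentile(sorted_values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def summarize(scores):
    """Return the mean, median, and upper percentiles of one year's scores."""
    ordered = sorted(scores)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "p75": percentile(ordered, 75),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


# Toy right-skewed scores: most units at 0, a few with high scores.
print(summarize([0, 0, 0, 0, 0, 0, 2, 4, 10, 25]))
```

On skewed data like this, the median stays at 0 while the mean and the upper percentiles carry the information, which is why the upper percentiles are the more informative trend lines.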

Household Income Visualization

We also have data pertaining to household income. It is a common perception that income and quality of life are inherently linked; that is, low income earners tend to live in low quality housing. It would be interesting to see whether income had an upward trend during that time. Household income has a ceiling in the data: any income greater than 10 million is capped.
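
The effect of the income ceiling can be illustrated with a short sketch. The cap value follows the 10 million figure in the text; the function name and the toy incomes are assumptions for illustration.

```python
from statistics import mean, median

INCOME_CAP = 10_000_000  # ceiling described in the text


def top_code(income, cap=INCOME_CAP):
    """Return the income as it would be recorded under the ceiling."""
    return min(income, cap)


# Toy incomes only; not taken from the NYCHVS.
raw = [30_000, 50_000, 25_000_000]
capped = [top_code(x) for x in raw]

print(median(raw), median(capped))  # the median is unaffected by the cap
print(mean(raw), mean(capped))      # the mean is pulled down by the cap
```

Because capping changes only the largest values, percentiles below the cap are unaffected, while means are biased downward.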

Figure 5: Household Income over time

Household income is slowly increasing over time, but this is expected once inflation is considered. Furthermore, these percentiles are calculated over the entirety of NYC, so we may be missing a spatial component of the data. This graphic is meant to show trends over time, which a line graph does well. We also separated the percentiles into two graphs, as their growth rates are on completely different scales.

Figure 7 shows the raw relationship between PQI and household income. It is meant to show how noisy the data is. Particularly noteworthy are the millionaires living in units with high index values. The perceived relationship is spurious at best here. However, if we change perspective, a high PQI does seem to be a good predictor of lower income. The plot itself has quite a few flaws. For instance, the density of points is not clear: almost half of the data lies on the line PQI = 0. However, the plot is not meant for inference, but to point out problems with the inference itself.

Figure 8: Average Household Income and Index by Sub-borough for 2017

Figure 8 shows a map of New York City sub-boroughs, shading the regions by mean household income and by index score. A darker shade of blue indicates lower mean income, and a darker shade of red indicates lower housing quality. The plot shows what one might expect, i.e., neighborhoods with lower quality housing generally correspond to a lower average household income. However, mean household income clearly does not predict housing quality, indicating that it would not have been appropriate to include it in the index without further consideration. The spatial plot gives the problem a motivating context that scatter plots do not, but scatter plots are employed later.

Relationship between household income and housing quality

We show a quantile relationship between the 5th percentile of household income and the 95th percentile of housing quality across neighborhoods (sub-boroughs). That is, neighborhoods with very poor housing are reasonably correlated with the same neighborhoods having very low income households, but this does not necessarily mean that low income families tend to live in poor quality housing or vice versa.
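
The quantile relationship described above can be sketched as follows: for each neighborhood, take the 5th percentile of household income and the 95th percentile of the index score, then correlate the two across neighborhoods. The sketch assumes records of the form `(neighborhood, income, pqi)`, uses nearest-rank percentiles and a Pearson correlation, and runs on toy data; none of these choices is confirmed as the authors' method.

```python
from collections import defaultdict
from math import sqrt


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def neighborhood_quantiles(records):
    """Map each neighborhood to (5th pct of income, 95th pct of PQI)."""
    incomes, scores = defaultdict(list), defaultdict(list)
    for hood, income, pqi in records:
        incomes[hood].append(income)
        scores[hood].append(pqi)
    return {
        hood: (percentile(sorted(incomes[hood]), 5),
               percentile(sorted(scores[hood]), 95))
        for hood in incomes
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy records only: (neighborhood, household income, PQI score).
toy = (
    [("A", i, q) for i, q in zip([10, 20, 30, 40], [0, 0, 2, 20])]
    + [("B", i, q) for i, q in zip([50, 60, 70, 80], [0, 0, 0, 10])]
    + [("C", i, q) for i, q in zip([100, 110, 120, 130], [0, 0, 0, 4])]
)
pairs = neighborhood_quantiles(toy)
r = pearson([p[0] for p in pairs.values()], [p[1] for p in pairs.values()])
print(pairs, r)
```

A negative correlation here says only that neighborhoods whose poorest households earn less also tend to contain worse housing at the extreme; it is a statement about neighborhoods, not about individual households.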

This shows the strongest relationship between the variables of interest. The axes, however, are very costly to interpret, and there is concern that the relationship here could easily be misinterpreted.

Figure 10: Quantile Relationship Over Time

This animation shows a clear trend: in most neighborhoods, the poorest structures are improving and the poorest residents are earning more over time. However, this is not true for all neighborhoods. We also have some semblance of a relationship here. This plot is perhaps the best presented here. The plotted axes pose some interpretation challenges, but the plot clearly shows the discussed trends over time while maintaining a clear relation between the variables.

References


  1. https://www.census.gov/programs-surveys/nychvs.html

  2. https://www.huduser.gov/publications/pdf/AHS_hsg.pdf